unimodal bias



RUBi: Reducing Unimodal Biases for Visual Question Answering

Neural Information Processing Systems

Visual Question Answering (VQA) is the task of answering questions about an image. VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer a large drop in performance when evaluated on data outside their training-set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e., examples that can be correctly classified without looking at the image.
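The masking idea described in this abstract can be illustrated with a minimal numpy sketch. The function name, the use of a sigmoid mask on the fused logits, and the simple sum of the two losses below are simplifying assumptions on my part (the paper specifies the exact formulation, including which gradients are stopped); the point is only that an example the question-only branch already answers confidently produces a smaller main loss, and hence a smaller gradient for the main model.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rubi_loss(fused_logits, q_only_logits, labels):
    """Sketch of a RUBi-style objective (names and composition illustrative).
    The sigmoid of the question-only logits masks the fused logits: when the
    question alone already points to the correct answer, the masked prediction
    is confident too, so the cross-entropy on that biased example shrinks."""
    mask = 1.0 / (1.0 + np.exp(-q_only_logits))        # sigmoid mask
    p_masked = softmax(fused_logits * mask)            # modulated main prediction
    p_q = softmax(q_only_logits)                       # question-only prediction
    n = len(labels)
    loss_main = -np.log(p_masked[np.arange(n), labels]).mean()
    loss_q = -np.log(p_q[np.arange(n), labels]).mean() # trains the q-only branch
    return loss_main, loss_q
```

For a three-class toy example with correct answer 0, a strongly question-biased branch (logits `[4, -4, -4]`) yields a lower main loss than an uninformative one (logits `[0, 0, 0]`), which is the down-weighting of biased examples in action.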



Reviews: RUBi: Reducing Unimodal Biases for Visual Question Answering

Neural Information Processing Systems

Originality: The proposed method is a novel dynamic loss re-weighting technique applied to VQA under the changing-priors condition (VQA-CP), where the train and test sets are deliberately constructed to have different distributions. The related works are adequately cited and discussed. While prior works have also focused on using knowledge from a question-only model to capture unnecessary biases in the dataset [25], the paper differs from [25] in some key aspects. E.g., the proposed method guides the whole model (including the visual encoding branch) to better learn "harder" examples, whereas [25] focuses only on reducing bias from the question encoding. Quality: The proposed method is sound and well-motivated.


Reviews: RUBi: Reducing Unimodal Biases for Visual Question Answering

Neural Information Processing Systems

After the authors' rebuttal, all reviewers believe the paper makes a significant enough contribution to be accepted to the conference. When large amounts of data must be collected for complex tasks such as VQA, bias in the labeling process is highly likely. Techniques that improve robustness to such biases can have a significant impact in these cases. The authors should incorporate the clarifications and results from the rebuttal into the paper and address the reviewers' comments.



Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective

Chen, Meiqi, Cao, Yixin, Zhang, Yan, Lu, Chaochao

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have facilitated the development of Multimodal LLMs (MLLMs). Despite their impressive capabilities, MLLMs often suffer from an over-reliance on unimodal biases (e.g., language bias and vision bias), leading to incorrect answers in complex multimodal tasks. To investigate this issue, we propose a causal framework to interpret the biases in Visual Question Answering (VQA) problems. Within our framework, we devise a causal graph to elucidate the predictions of MLLMs on VQA problems, and assess the causal effect of biases through an in-depth causal analysis. Motivated by the causal graph, we introduce a novel MORE dataset, consisting of 12,000 VQA instances. This dataset is designed to challenge MLLMs' abilities, necessitating multi-hop reasoning and the surmounting of unimodal biases. Furthermore, we propose two strategies to mitigate unimodal biases and enhance MLLMs' reasoning capabilities, including a Decompose-Verify-Answer (DeVA) framework for limited-access MLLMs and the refinement of open-source MLLMs through fine-tuning. Extensive quantitative and qualitative experiments offer valuable insights for future research. Our project page is at https://opencausalab.github.io/MORE.
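The Decompose-Verify-Answer idea for limited-access MLLMs can be sketched as a simple control loop. Everything below is a hypothetical illustration: the prompt strings, the `ask` callable, and the verify-then-reanswer flow are my assumptions, not the authors' implementation; the sketch only shows how verifying each sub-answer against the image can catch a unimodal (language-prior) shortcut before answers are composed.

```python
def deva(question, ask):
    """Hypothetical Decompose-Verify-Answer loop.
    ask(prompt) -> str is any black-box (M)LLM call; the image is assumed
    to be attached to every call by the caller."""
    # 1. Decompose the multi-hop question into simpler sub-questions.
    subs = [s.strip() for s in ask(f"DECOMPOSE: {question}").split(";")]
    answered = []
    for sub in subs:
        # 2. Answer each sub-question, then verify the answer against the
        #    image to catch language-prior shortcuts.
        cand = ask(f"ANSWER: {sub}")
        verdict = ask(f"VERIFY against the image: {sub} -> {cand}")
        if verdict.lower().startswith("no"):
            cand = ask(f"REANSWER using only the image: {sub}")
        answered.append(f"{sub}: {cand}")
    # 3. Compose the verified sub-answers into the final answer.
    return ask("COMPOSE final answer from: " + "; ".join(answered))
```

With a scripted stub standing in for the model, one can trace the loop end to end: a sub-answer that contradicts the image ("bones") is rejected at the verify step and replaced by an image-grounded one ("fish") before composition.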


A Theory of Unimodal Bias in Multimodal Learning

Zhang, Yedi, Latham, Peter E., Saxe, Andrew

arXiv.org Artificial Intelligence

Using multiple input streams simultaneously in training multimodal neural networks is intuitively advantageous, but practically challenging. A key challenge is unimodal bias, where a network overly relies on one modality and ignores others during joint training. While unimodal bias is well-documented empirically, our theoretical understanding of how architecture and data statistics influence this bias remains incomplete. Here we develop a theory of unimodal bias with deep multimodal linear networks. We calculate the duration of the unimodal phase in learning as a function of the depth at which modalities are fused within the network, dataset statistics, and initialization. We find that the deeper the layer at which fusion occurs, the longer the unimodal phase. A long unimodal phase can lead to a generalization deficit and permanent unimodal bias in the overparametrized regime. In addition, our theory reveals the modality learned first is not necessarily the modality that contributes more to the output. Our results, derived for multimodal linear networks, extend to ReLU networks in certain settings. Taken together, this work illuminates pathologies of multimodal learning under joint training, showing that late and intermediate fusion architectures can give rise to long unimodal phases and permanent unimodal bias.
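The fusion-depth effect lends itself to a toy illustration. The numpy sketch below is my own simplified construction, not the paper's exact setting: each modality is a scalar passed through its own chain of `depth` scalar weights, branch outputs are summed, and the initial end-to-end gain is matched across depths. With deeper chains, the multiplicative dynamics keep the weaker modality's gain on a plateau far longer, a toy analogue of the unimodal phase.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two modalities, both predictive of y, but modality A carries the
# stronger signal (coefficients 1.0 vs 0.3).
n = 512
x_a = rng.normal(size=n)
x_b = rng.normal(size=n)
y = 1.0 * x_a + 0.3 * x_b

def train(depth, init, steps=4000, lr=0.05):
    """Late-fusion deep linear net: each modality gets its own chain of
    `depth` scalar weights (all set to `init`); branch outputs are summed.
    Returns the end-to-end gain of each branch at every step."""
    w_a = np.full(depth, init)
    w_b = np.full(depth, init)
    gains = np.zeros((steps, 2))
    for t in range(steps):
        ga, gb = np.prod(w_a), np.prod(w_b)
        gains[t] = ga, gb
        err = ga * x_a + gb * x_b - y
        grad_ga = np.mean(err * x_a)   # d loss / d (end-to-end gain)
        grad_gb = np.mean(err * x_b)
        for i in range(depth):         # chain rule through the weight product
            w_a[i] -= lr * grad_ga * np.prod(np.delete(w_a, i))
            w_b[i] -= lr * grad_gb * np.prod(np.delete(w_b, i))
    return gains

# Match the initial end-to-end gain (0.2**3) across depths.
shallow = train(depth=1, init=0.2 ** 3)
deep = train(depth=3, init=0.2)

def first_cross(g, thr):
    """First step at which a branch's gain exceeds thr (None if never)."""
    idx = np.nonzero(g > thr)[0]
    return int(idx[0]) if idx.size else None
```

In this toy run the shallow net picks up the weak modality within tens of steps, while the depth-3 net leaves it near zero for hundreds of steps before both branches eventually reach their optimal gains.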


Synthetic Misinformers: Generating and Combating Multimodal Misinformation

Papadopoulos, Stefanos-Iordanis, Koutlis, Christos, Papadopoulos, Symeon, Petrantonakis, Panagiotis C.

arXiv.org Artificial Intelligence

With the expansion of social media and the increasing dissemination of multimedia content, the spread of misinformation has become a major concern. This necessitates effective strategies for multimodal misinformation detection (MMD) that detect whether the combination of an image and its accompanying text could mislead or misinform. Due to the data-intensive nature of deep neural networks and the labor-intensive process of manual annotation, researchers have been exploring various methods for automatically generating synthetic multimodal misinformation - which we refer to as Synthetic Misinformers - in order to train MMD models. However, limited evaluation on real-world misinformation and a lack of comparisons with other Synthetic Misinformers make it difficult to assess progress in the field. To address this, we perform a comparative study on existing and new Synthetic Misinformers that involves (1) out-of-context (OOC) image-caption pairs, (2) cross-modal named entity inconsistency (NEI) as well as (3) hybrid approaches, and we evaluate them against real-world misinformation using the COSMOS benchmark. The comparative study showed that our proposed CLIP-based Named Entity Swapping can lead to MMD models that surpass other OOC and NEI Misinformers in terms of multimodal accuracy, and that hybrid approaches can lead to even higher detection accuracy. Nevertheless, after alleviating information leakage from the COSMOS evaluation protocol, low Sensitivity scores indicate that the task is significantly more challenging than previous studies suggested. Finally, our findings showed that NEI-based Synthetic Misinformers tend to suffer from a unimodal bias, where text-only MMDs can outperform multimodal ones.


RUBi: Reducing Unimodal Biases for Visual Question Answering

Cadene, Remi, Dancette, Corentin, Ben-younes, Hedi, Cord, Matthieu, Parikh, Devi

Neural Information Processing Systems

Visual Question Answering (VQA) is the task of answering questions about an image. VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer a large drop in performance when evaluated on data outside their training-set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e., examples that can be correctly classified without looking at the image.